Retrieval of Mandarin broadcast news using spoken queries

Authors

  • Berlin Chen
  • Hsin-Min Wang
  • Lin-Shan Lee
Abstract

Considering the monosyllabic structure of the Chinese language, a whole class of indexing features for retrieval of Mandarin broadcast news using syllable-level statistical characteristics has been previously investigated. This paper presents the improvements achieved over the previous results. The major differences are: (1) Multi-scale character- and word-level indexing terms have been integrated with the syllable-level information. (2) Information cues from the contemporary newswire text corpus have been used to create more accurate syllable indexing terms. (3) Automatic document expansion, blind relevance feedback, and query expansion via the term association matrix have been applied in retrieval. With all these schemes, the average precision can be improved from 55.46% to 71.29%.
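
To make item (3) more concrete, the following is a minimal sketch of blind relevance feedback in its generic textbook form (a Rocchio-style update over term-count vectors). It is not the authors' implementation: the abstract does not specify their weighting scheme, so the function names, the `top_k`, `alpha`, and `beta` parameters, and the use of raw counts with cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity (ties by index)."""
    scores = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


def blind_feedback(query_vec, doc_vecs, top_k=2, alpha=1.0, beta=0.5):
    """Expand the query with terms from the top_k initially retrieved documents.

    "Blind" means no human judgment is used: the top-ranked documents are
    simply assumed to be relevant. alpha and beta are hypothetical weights.
    """
    top = rank(query_vec, doc_vecs)[:top_k]
    expanded = Counter({t: alpha * w for t, w in query_vec.items()})
    for i in top:
        for t, w in doc_vecs[i].items():
            # Add the average weight of each term over the assumed-relevant set.
            expanded[t] += beta * w / len(top)
    return dict(expanded)
```

In a syllable-based setting, the terms would be syllable-level indexing units rather than words; the expanded query can then match documents that share no terms with the original query but do share terms with the top-ranked documents.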

Similar resources

Document Expansion using a Side Collection for Monolingual and Cross-language Spoken Document Retrieval

This paper presents a method of document expansion using a side collection for improving the overall performance in retrieving spoken documents using text queries. This method is applied to Chinese spoken document retrieval (SDR) tasks where a series of experiments have been carried out for both monolingual and cross-language SDR systems. In our monolingual retrieval experiments, Cantonese broa...

Mandarin Chinese Broadcast News Retrieval and Summarization Using Probabilistic Generative Models

This paper presents our recent research work on applying probabilistic generative models to Mandarin Chinese broadcast news retrieval and summarization. Most models can be trained in either a supervised or unsupervised manner. In addition, both literal term matching and concept matching strategies have been intensively investigated. This paper also presents a prototype web-based Mandarin Chines...

Retrieval of broadcast news speech in Mandarin Chinese collected in Taiwan using syllable-level statistical characteristics

Spoken document retrieval has been extensively studied in recent years because of its high potential in various applications in the near future. Considering the monosyllabic structure of Chinese language, a whole class of indexing features for retrieval of spoken documents in Mandarin Chinese using syllable-level statistical characteristics has been studied, and very encouraging experimental re...

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval

This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...

Experiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese

Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multi-media collections in the near future. Considering the characteristics and monosyllabic structure of the Chinese language, the syllable-based indexing for retrieval of spoken documents in Mandarin Chinese has been investigated, and extensive experiments on retrieval...

Publication date: 2000